-
Notifications
You must be signed in to change notification settings - Fork 40
Add resident device change call #1517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d53a23f
to
9d7e12c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 / 16 files reviewed (will be continued)
88f672b
to
53ec753
Compare
I still oppose against adding provider specific code to ops interface. note: remember that you will have to call umfMemoryProviderGetPriv() in this function to retrieve ze_memory_provider_t from provider handle |
src/provider/provider_level_zero.c
Outdated
|
||
static void init_ze_global_state(void) { | ||
|
||
char *lib_name = getenv("UMF_ZE_LOADER_LIB_NAME"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do this thru CTL please.
We created CTL to limit new envvariables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for tests only. Env variables are simple to write and to understand for future maintainers. Could you please accept it as is in this PR? But I am OK if anyone wants to change it to CTL after this PR is merged.
src/provider/provider_level_zero.c
Outdated
|
||
// TODO: add assertions to UMF and change it to be an assertion | ||
if (info->props.base != (void *)key) { | ||
LOG_ERR("key:%p is different than base:%p", (void *)key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG_FATAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, applied
src/utils/utils_log.h
Outdated
#define LOG_DEBUG(...) utils_log(LOG_DEBUG, __func__, __VA_ARGS__); | ||
#define LOG_INFO(...) utils_log(LOG_INFO, __func__, __VA_ARGS__); | ||
#define LOG_WARN(...) utils_log(LOG_WARNING, __func__, __VA_ARGS__); | ||
#define LOG_ERR(...) utils_log(LOG_ERROR, __func__, __VA_ARGS__); | ||
#define LOG_FATAL(...) utils_log(LOG_FATAL, __func__, __VA_ARGS__); | ||
|
||
#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, __func__, __VA_ARGS__); | ||
#define LOG_PINFO(...) utils_plog(LOG_INFO, __func__, __VA_ARGS__); | ||
#define LOG_PWARN(...) utils_plog(LOG_WARNING, __func__, __VA_ARGS__); | ||
#define LOG_PERR(...) utils_plog(LOG_ERROR, __func__, __VA_ARGS__); | ||
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, __func__, __VA_ARGS__); | ||
#ifdef UMF_DEVELOPER_MODE | ||
#define UMF_STRINGIFY(x) #x | ||
#define UMF_TOSTRING(x) UMF_STRINGIFY(x) | ||
#define UMF_FUNC_DESC() __FILE__ ":" UMF_TOSTRING(__LINE__) | ||
#else | ||
#define UMF_FUNC_DESC() __func__ | ||
#endif | ||
|
||
#define LOG_DEBUG(...) utils_log(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_INFO(...) utils_log(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_WARN(...) utils_log(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_ERR(...) utils_log(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_FATAL(...) utils_log(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__); | ||
|
||
#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_PINFO(...) utils_plog(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_PWARN(...) utils_plog(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_PERR(...) utils_plog(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__); | ||
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a separate PR.
There are comments which i whould make - but i strongly believe that this discussion should happen in the separate PR to do not block this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/provider/provider_level_zero.c
Outdated
// found | ||
if (is_adding) { | ||
utils_write_unlock(&ze_provider->resident_device_rwlock); | ||
// impossible for UR, should be an assertion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho we should not assume that user will not do something then is should be an assertion. If it comes it for any kind of user it should be validated an correct error should be returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, comment removed
src/provider/provider_level_zero.c
Outdated
|
||
uint32_t existing_peer_index = 0; | ||
utils_write_lock(&ze_provider->resident_device_rwlock); | ||
while (existing_peer_index < ze_provider->resident_device_count && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: this code whould be much easier to read if it whould be a for loop
for (index = 0; index < count; index++) if (device[index] == index) break;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need for loop index to be available outside the loop. I don't think it is more readable. Left as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do the incrementation within the while loop and have extra &&
to mark the stop, I believe I agree with Łukasz - this would look better as for
.
// you obviously don't have to init anything in for (e.g. for(; index < count; index++)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes i noticed that you need it outside of loop - this is why i write
for(index = 0; index < count; index++) if (device[index] == index) break;
not
for (int index = 0; index < count; index++) if (device[index] == index) break;
If something is for loop we should use for not while. If you need access to loop counter outside of function you can declare it before loop - this is a standard approach which should not confuse reader.
const uint32_t new_capacity = | ||
ze_provider->resident_device_capacity * 2 + | ||
1; // +1 to work also with old capacity == 0 | ||
ze_device_handle_t *new_handles = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this code is performance critical, or will be called multiple times
If not i would prefer to have this array just always reallocated by one,
If yes can we abstract this vector like structure to separate module, or even better just use list structure with is included in umf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, not critical, applied
.failed_changes = 0, | ||
}; | ||
|
||
umf_result_t result = umfMemoryTrackerIterateAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is "hacky" and it will not work if someone uses pool without tracker or just use provider without pool.
We allready solved this issue in other providers were we keep track allocation by the provider (for example please check os_provider)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told by you, that I can use tracker. Since it is not a case now that l0 provider is used without tracker, so I am only leaving a comment here that warns about what you noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we miss communicated - I will follow up on this later on.
Great! Using umfLevelZeroMemoryProviderResidentDeviceChange simplified code even more than using ops. Applied with pleasure. |
@lukaszstolarczuk , @bratpiorka , @ldorau , @lplewa , please do the same wrt your comments when you're back |
src/provider/provider_level_zero.c
Outdated
|
||
uint32_t existing_peer_index = 0; | ||
utils_write_lock(&ze_provider->resident_device_rwlock); | ||
while (existing_peer_index < ze_provider->resident_device_count && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do the incrementation within the while loop and have extra &&
to mark the stop, I believe I agree with Łukasz - this would look better as for
.
// you obviously don't have to init anything in for (e.g. for(; index < count; index++)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refactor commits structure.
commits in PR should be split to functional parts, with proper commit message.
Each commit should work (at least it should build - but we do not check it so exception happens). You should not split implementation, tests, documentation to separate commits.
If i were you i would create(This list is an inspiration for proper commits structure, you do not have split your PR to exactly this commits) : commit for adding iterate over critnib, commit for changes in asserts, maybe commit for this mock library, commit for allowing ovveride lib_name, commit for change in setResidentDevides, commit for main functionality of this PR....
src/provider/provider_level_zero.c
Outdated
|
||
uint32_t existing_peer_index = 0; | ||
utils_write_lock(&ze_provider->resident_device_rwlock); | ||
while (existing_peer_index < ze_provider->resident_device_count && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes i noticed that you need it outside of loop - this is why i write
for(index = 0; index < count; index++) if (device[index] == index) break;
not
for (int index = 0; index < count; index++) if (device[index] == index) break;
If something is for loop we should use for not while. If you need access to loop counter outside of function you can declare it before loop - this is a standard approach which should not confuse reader.
LOG_ERR("umfMemoryTrackerIterateAll did not manage to do some change " | ||
"numFailed: %d, numSuccess: %d", | ||
privData.success_changes, privData.failed_changes); | ||
return UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When returining this error code get_last_native_error function should be able to return more precise error message - please add support for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, added store_last_native_error(result);
in ze_memory_provider_resident_device_change_helper
.failed_changes = 0, | ||
}; | ||
|
||
umf_result_t result = umfMemoryTrackerIterateAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we miss communicated - I will follow up on this later on.
return 0; | ||
} | ||
|
||
umf_result_t umfLevelZeroMemoryProviderResidentDeviceChange( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have add and remove function - i recall pbalcer already suggested it too.
I will split it to 3 functions - add, delete, and helper function to iterate over existing allocations (so we limit code duplication)
Refactor will be squashing all commits to one during merge. |
We do not do squash merge in this project |
ee516a4
to
dbe3b88
Compare
e2b49a4
to
96e69c4
Compare
UMF part of https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_peer_access.asciidoc feature.
UR/llvm part is in intel/llvm#19257